The Cramer Distance as a Solution to Biased Wasserstein Gradients
Authors
Abstract
The Wasserstein probability metric has received much attention from the machine learning community. Unlike the Kullback-Leibler divergence, which strictly measures change in probability, the Wasserstein metric reflects the underlying geometry between outcomes. The value of being sensitive to this geometry has been demonstrated, among others, in ordinal regression and generative modelling. In this paper we describe three natural properties of probability divergences that reflect requirements from machine learning: sum invariance, scale sensitivity, and unbiased sample gradients. The Wasserstein metric possesses the first two properties but, unlike the Kullback-Leibler divergence, does not possess the third. We provide empirical evidence suggesting that this is a serious issue in practice. Leveraging insights from probabilistic forecasting we propose an alternative to the Wasserstein metric, the Cramér distance. We show that the Cramér distance possesses all three desired properties, combining the best of the Wasserstein and Kullback-Leibler divergences. To illustrate the relevance of the Cramér distance in practice we design a new algorithm, the Cramér Generative Adversarial Network (GAN), and show that it performs significantly better than the related Wasserstein GAN.
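In one dimension, the Cramér distance described above is the squared L2 distance between cumulative distribution functions. The sketch below (plain NumPy, with function and variable names of my own choosing rather than anything taken from the paper) estimates it from two samples by integrating the squared gap between their empirical CDFs.

```python
import numpy as np

def cramer_distance(x, y):
    """Squared-L2 (Cramer) distance between the empirical CDFs of two
    1-D samples: the integral of (F_x(t) - F_y(t))**2 over t.

    Illustrative sketch only; the estimator and its naming are assumptions,
    not the paper's reference implementation.
    """
    x, y = np.sort(x), np.sort(y)
    # Evaluate both empirical CDFs on the merged support.
    grid = np.sort(np.concatenate([x, y]))
    fx = np.searchsorted(x, grid, side="right") / len(x)
    fy = np.searchsorted(y, grid, side="right") / len(y)
    # Both CDFs are piecewise constant between consecutive grid points,
    # so the integral is a sum of rectangle areas.
    widths = np.diff(grid)
    return float(np.sum((fx[:-1] - fy[:-1]) ** 2 * widths))

rng = np.random.default_rng(0)
p = rng.normal(0.0, 1.0, size=2000)
q = rng.normal(0.5, 1.0, size=2000)
print(cramer_distance(p, q))  # grows with the gap between the two means
```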
Similar resources
Demystifying MMD GANs
We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs. As our main theoretical contribution, we clarify the situation with bias in GAN loss functions raised by recent work: we show that gradient estimators used in the optimization process for both MMD GANs and Wasserstein GANs are unbiased, but learning...
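For reference, the MMD used as the critic above admits a simple unbiased U-statistic estimator. The sketch below assumes a Gaussian RBF kernel with a hand-picked bandwidth; both choices are my own for illustration and are not the configuration used in the MMD-GAN work.

```python
import numpy as np

def mmd2_unbiased(x, y, sigma=1.0):
    """Unbiased estimate of the squared Maximum Mean Discrepancy between
    samples x (n, d) and y (m, d) under a Gaussian RBF kernel.
    Kernel and bandwidth are illustrative assumptions."""
    def rbf(a, b):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    kxx, kyy, kxy = rbf(x, x), rbf(y, y), rbf(x, y)
    n, m = len(x), len(y)
    # Drop the diagonal terms so each expectation is estimated without bias.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    term_xy = 2.0 * kxy.mean()
    return float(term_xx + term_yy - term_xy)

rng = np.random.default_rng(0)
print(mmd2_unbiased(rng.normal(size=(256, 2)), rng.normal(1.0, 1.0, size=(256, 2))))
```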
Limit Distribution of Distances in Biased Random Tries
The trie is a sort of digital tree. Ideally, to achieve balance, the trie should grow from an unbiased source generating keys of bits with equal likelihoods. In practice, the lack of bias is not always guaranteed. We investigate the distance between randomly selected pairs of nodes among the keys in a biased trie. This research complements that of Christophi and Mahmoud (2005); however, the res...
Generalized Wasserstein distance and its application to transport equations with source
In this article, we generalize the Wasserstein distance to measures with different masses. We study the properties of such a distance. In particular, we show that it metrizes weak convergence for tight sequences. We use this generalized Wasserstein distance to study a transport equation with source, in which both the vector field and the source depend on the measure itself. We prove existence and un...
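As a point of comparison for the generalization discussed above, the standard 1-D Wasserstein-1 distance between two equal-mass empirical measures has a closed form: the mean absolute difference of the sorted samples. The minimal sketch below computes only this classical baseline; it does not implement the paper's extension to measures with different total masses.

```python
import numpy as np

def wasserstein1_1d(x, y):
    """Standard 1-D Wasserstein-1 distance between two equal-size,
    equal-mass empirical measures: the mean absolute difference of the
    sorted samples (equivalently, the L1 distance between quantile functions).
    Baseline sketch only; unequal masses are out of scope here."""
    x, y = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x - y)))

rng = np.random.default_rng(0)
print(wasserstein1_1d(rng.normal(0, 1, 1000), rng.normal(1, 1, 1000)))  # close to 1.0
```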
Distribution of Inter-Node Distances in Digital Trees
We investigate distances between pairs of nodes in digital trees (digital search trees (DST), and tries). By analytic techniques, such as the Mellin Transform and poissonization, we describe a program to determine the moments of these distances. The program is illustrated on the mean and variance. One encounters delayed Mellin transform equations, which we solve by inspection. Interestingly, th...
On Wasserstein Reinforcement Learning and the Fokker-Planck equation
Policy gradient methods often achieve better performance when the change in policy is limited to a small Kullback-Leibler divergence. We derive policy gradients where the change in policy is limited to a small Wasserstein distance (or trust region). This is done in the discrete and continuous multi-armed bandit settings with entropy regularisation. We show that in the small steps limit with re...
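A small numerical illustration of why the choice of divergence matters for such a trust region: over ordered bandit arms, moving probability mass to a nearby arm and to a distant arm can look identical under the KL divergence while differing under the Wasserstein distance. The arm values and policies below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def kl(p, q):
    """KL divergence between two discrete distributions (assumes q > 0)."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical 3-armed bandit with ordered arm values {0, 1, 2}.
arms   = np.array([0.0, 1.0, 2.0])
pi_old = np.array([0.98, 0.01, 0.01])
pi_a   = np.array([0.01, 0.98, 0.01])  # mass moved to the adjacent arm
pi_b   = np.array([0.01, 0.01, 0.98])  # mass moved to the far arm

for name, pi_new in [("adjacent", pi_a), ("far", pi_b)]:
    w1 = wasserstein_distance(arms, arms, pi_old, pi_new)
    print(name, "KL:", round(kl(pi_old, pi_new), 3), "W1:", round(w1, 3))
# KL is identical in both cases; W1 roughly doubles for the far arm,
# reflecting the underlying distance between the actions themselves.
```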
Journal: CoRR
Volume: abs/1705.10743
Pages: -
Publication date: 2017